{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Text to Bounding Box Auto-labeling through TAO Data Services\n",
    "\n",
    "[Grounding DINO](https://arxiv.org/abs/2303.05499) is a state of the art open-set object detection model based on DINO. Grounding DINO can detect arbitrary objects with human inputs such as category names or referring expressions.\n",
    "\n",
    "\n",
    "### Sample prediction of Swin-Base + Grounding DINO model\n",
    "<img align=\"center\" src=\"sample.jpg\" width=\"960\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learning Objectives\n",
    "\n",
    "In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:\n",
    "\n",
    "* Take a pretrained model and finetune a Grounding DINO model on COCO dataset\n",
    "* Evaluate the trained model\n",
    "\n",
    "For inference and deployment workflow, please refer to zero-shot inference notboook.\n",
    "\n",
    "## Table of Contents\n",
    "\n",
    "This notebook shows an example usecase of Grounding DINO using Train Adapt Optimize (TAO) Toolkit.\n",
    "\n",
    "0. [Set up env variables and map drives](#head-0)\n",
    "1. [Installing the TAO launcher](#head-1)\n",
    "2. [Prepare dataset and pre-trained model](#head-2)\n",
    "3. [Generate pseudo-boxes with Grounding DINO](#head-3)\n",
    "4. [Visualize pseudo-boxes](#head-4)\n",
    "5. [Convert ODVG dataset to COCO format](#head-5)\n",
    "6. [Generate instance segmentation masks [Optional]](#head-6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0. Set up env variables and map drives <a class=\"anchor\" id=\"head-0\"></a>\n",
    "\n",
    "When using the purpose-built pretrained models from NGC, please make sure to set the `$KEY` environment variable to the key as mentioned in the model overview. Failing to do so, can lead to errors when trying to load them as pretrained models.\n",
    "\n",
    "The following notebook requires the user to set an env variable called the `$LOCAL_PROJECT_DIR` as the path to the users workspace. Please note that the dataset to run this notebook is expected to reside in the `$LOCAL_PROJECT_DIR/data`, while the TAO experiment generated collaterals will be output to `$LOCAL_PROJECT_DIR/mal/`. More information on how to set up the dataset and the supported steps in the TAO workflow are provided in the subsequent cells.\n",
    "\n",
    "The TAO launcher uses docker containers under the hood, and **for our data and results directory to be visible to the docker, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options like the Environment Variables and amount of Shared Memory available to the TAO launcher. <br>\n",
    "\n",
    "`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json`  file. Here, we can map directories in which we save the data, specs, results and cache. You should configure it for your specific case so these directories are correctly visible to the docker container.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Please define this local project directory that needs to be mapped to the TAO docker session.\n",
    "os.environ[\"LOCAL_PROJECT_DIR\"] = FIXME\n",
    "\n",
    "os.environ[\"HOST_DATA_DIR\"] = os.path.join(os.getenv(\"LOCAL_PROJECT_DIR\", os.getcwd()), \"data\")\n",
    "os.environ[\"HOST_RESULTS_DIR\"] = os.path.join(os.getenv(\"LOCAL_PROJECT_DIR\", os.getcwd()), \"text2box\", \"results\")\n",
    "\n",
    "# Set this path if you don't run the notebook from the samples directory.\n",
    "# %env NOTEBOOK_ROOT=~/tao-samples/text2box\n",
    "\n",
    "# The sample spec files are present in the same path as the downloaded samples.\n",
    "os.environ[\"HOST_SPECS_DIR\"] = os.path.join(\n",
    "    os.getenv(\"NOTEBOOK_ROOT\", os.getcwd()),\n",
    "    \"specs\"\n",
    ")\n",
    "\n",
    "print(f\"Configuration files are available at {os.environ['HOST_SPECS_DIR']}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! mkdir -p $HOST_DATA_DIR\n",
    "! mkdir -p $HOST_SPECS_DIR\n",
    "! mkdir -p $HOST_RESULTS_DIR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Mapping up the local directories to the TAO docker.\n",
    "import json\n",
    "import os\n",
    "mounts_file = os.path.expanduser(\"~/.tao_mounts.json\")\n",
    "tao_configs = {\n",
    "   \"Mounts\":[\n",
    "         # Mapping the Local project directory\n",
    "        {\n",
    "            \"source\": os.environ[\"LOCAL_PROJECT_DIR\"],\n",
    "            \"destination\": \"/workspace/tao-experiments\"\n",
    "        },\n",
    "       {\n",
    "           \"source\": os.environ[\"HOST_DATA_DIR\"],\n",
    "           \"destination\": \"/data\"\n",
    "       },\n",
    "       {\n",
    "           \"source\": os.environ[\"HOST_SPECS_DIR\"],\n",
    "           \"destination\": \"/specs\"\n",
    "       },\n",
    "       {\n",
    "           \"source\": os.environ[\"HOST_RESULTS_DIR\"],\n",
    "           \"destination\": \"/results\"\n",
    "       },\n",
    "       {\n",
    "           \"source\": \"~/.cache\",\n",
    "           \"destination\": \"/.cache\"\n",
    "       }\n",
    "   ],\n",
    "   \"DockerOptions\": {\n",
    "        \"shm_size\": \"64G\",\n",
    "        \"ulimits\": {\n",
    "            \"memlock\": -1,\n",
    "            \"stack\": 67108864\n",
    "         },\n",
    "        \"user\": \"{}:{}\".format(os.getuid(), os.getgid()),\n",
    "        \"network\": \"host\"\n",
    "   }\n",
    "}\n",
    "# Writing the mounts file.\n",
    "with open(mounts_file, \"w\") as mfile:\n",
    "    json.dump(tao_configs, mfile, indent=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cat ~/.tao_mounts.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Installing the TAO launcher <a class=\"anchor\" id=\"head-1\"></a>\n",
    "The TAO launcher is a python package distributed as a python wheel listed in the `nvidia-pyindex` python index. You may install the launcher by executing the following cell.\n",
    "\n",
    "Please note that TAO Toolkit recommends users to run the TAO launcher in a virtual env with python 3.6.9. You may follow the instruction in this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have setup virtualenvwrapper, please set the version of python to be used in the virtual env by using the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running\n",
    "\n",
    "```sh\n",
    "export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x\n",
    "```\n",
    "where x >= 6 and <= 8\n",
    "\n",
    "We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing TAO python package, please make sure of the following software requirements:\n",
    "* python >=3.7, <=3.10.x\n",
    "* docker-ce > 19.03.5\n",
    "* docker-API 1.40\n",
    "* nvidia-container-toolkit > 1.3.0-1\n",
    "* nvidia-container-runtime > 3.4.0-1\n",
    "* nvidia-docker2 > 2.5.0-1\n",
    "* nvidia-driver > 455+\n",
    "\n",
    "Once you have installed the pre-requisites, please log in to the docker registry nvcr.io by following the command below\n",
    "\n",
    "```sh\n",
    "docker login nvcr.io\n",
    "```\n",
    "\n",
    "You will be triggered to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.\n",
    "\n",
    "Please note that TAO Toolkit recommends users to run the TAO launcher in a virtual env with python >=3.6.9. You may follow the instruction in this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a python virtual env using the virtualenv and virtualenvwrapper packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# SKIP this step IF you have already installed the TAO launcher.\n",
    "!pip3 install nvidia-pyindex\n",
    "!pip3 install nvidia-tao"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# View the versions of the TAO launcher\n",
    "!tao info --verbose"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Prepare dataset and pre-trained model <a class=\"anchor\" id=\"head-2\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 Prepare dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " We will be using the COCO dataset for the tutorial. The following script will download COCO dataset automatically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create local dir\n",
    "!mkdir -p $HOST_DATA_DIR\n",
    "# Download the data\n",
    "!bash $HOST_SPECS_DIR/download_coco.sh $HOST_DATA_DIR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Verification\n",
    "!ls -l $HOST_DATA_DIR/raw-data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Download pre-trained model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will download the original Grounding DINO Swin-Base model from GitHub.. For more details about the model, please refer to [https://github.com/IDEA-Research/GroundingDINO/tree/main](https://github.com/IDEA-Research/GroundingDINO/tree/main)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# download a checkpoint\n",
    "!mkdir -p $LOCAL_PROJECT_DIR/text2box/pretrained_grounding_dino_vswin_base/\n",
    "!wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth -O $LOCAL_PROJECT_DIR/text2box/pretrained_grounding_dino_vswin_base/swin_base.pth"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Check that model is downloaded into dir.\")\n",
    "!ls -l $LOCAL_PROJECT_DIR/text2box/pretrained_grounding_dino_vswin_base/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# NOTE: The following paths are set from the perspective of the TAO Docker.\n",
    "\n",
    "# The data is saved here\n",
    "%env DATA_DIR = /data\n",
    "%env SPECS_DIR = /specs\n",
    "%env RESULTS_DIR = /results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Generate pseudo-boxes with Grounding DINO <a class=\"anchor\" id=\"head-3\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cat $HOST_SPECS_DIR/autolabel.yaml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"For multi-GPU, change `gpu_ids` in autolabel.yaml based on your machine.\")\n",
    "!tao dataset auto_label generate -e $SPECS_DIR/autolabel.yaml"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Visualize pseudo-boxes <a class=\"anchor\" id=\"head-4\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Simple grid visualizer\n",
    "!pip3 install \"matplotlib>=3.3.3, <4.0\"\n",
    "import matplotlib.pyplot as plt\n",
    "import os\n",
    "from math import ceil\n",
    "valid_image_ext = ['.jpg']\n",
    "\n",
    "def visualize_images(output_path, num_cols=4, num_images=10):\n",
    "    num_rows = int(ceil(float(num_images) / float(num_cols)))\n",
    "    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30])\n",
    "    f.tight_layout()\n",
    "\n",
    "    a = [os.path.join(output_path, image) for image in os.listdir(output_path) \n",
    "         if os.path.splitext(image)[1].lower() in valid_image_ext]\n",
    "    for idx, img_path in enumerate(a[:num_images]):\n",
    "        col_id = idx % num_cols\n",
    "        row_id = idx // num_cols\n",
    "        img = plt.imread(img_path)\n",
    "        axarr[row_id, col_id].imshow(img) \n",
    "        axarr[row_id, col_id].axis(\"off\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualizing the sample images.\n",
    "IMAGE_DIR = os.path.join(os.environ['HOST_RESULTS_DIR'], \"images_annotated\")\n",
    "COLS = 2 # number of columns in the visualizer grid.\n",
    "IMAGES = 4 # number of images to visualize.\n",
    "\n",
    "visualize_images(IMAGE_DIR, num_cols=COLS, num_images=IMAGES)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you find that there are some missing instances from this initial run, you may add another iteration to the `iteration_scheduler`. For closed-set detection, we usually recommend increasing the `conf_threshold` to higher value than your previous iteration. Please note that iterativing auto-labeling for closed-set problem may not always yield the best result compared to phrase grounding task."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Convert ODVG dataset to COCO format <a class=\"anchor\" id=\"head-5\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cat $HOST_SPECS_DIR/convert.yaml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert ODVG to COCO\n",
    "!tao dataset annotations convert -e $SPECS_DIR/convert.yaml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip3 install pycocotools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check the converted COCO JSON file can be loaded through pycocotools\n",
    "from pycocotools.coco import COCO\n",
    "\n",
    "c = COCO(os.path.join(os.environ['HOST_RESULTS_DIR'], \"final_annotation.json\"))\n",
    "# Get annotations of only person class\n",
    "anns = c.loadAnns(c.getAnnIds(catIds=[1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "TAO also supports subsequently converting the dataset format to KITTI as well. For more information on how to convert the COCO formatted annotations to KITTI, refer to the [Data Services - Annotations](https://docs.nvidia.com/tao/tao-toolkit/text/data_services/annotations.html#annotations) section in the TAO documentation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Generate instance segmentation masks [Optional] <a class=\"anchor\" id=\"head-6\"></a>\n",
    "\n",
    "Now that you have generated bounding box annotations using Grounding DINO, you can use these annotations to generate instance segmentation masks using MaskAutoLabel (MAL)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A. Download mask autolabel model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Installing NGC CLI on the local machine.\n",
    "## Download and install\n",
    "%env CLI=ngccli_cat_linux.zip\n",
    "!mkdir -p $LOCAL_PROJECT_DIR/ngccli\n",
    "\n",
    "# Remove any previously existing CLI installations\n",
    "!rm -rf $LOCAL_PROJECT_DIR/ngccli/*\n",
    "!wget \"https://ngc.nvidia.com/downloads/$CLI\" -P $LOCAL_PROJECT_DIR/ngccli\n",
    "!unzip -u \"$LOCAL_PROJECT_DIR/ngccli/$CLI\" -d $LOCAL_PROJECT_DIR/ngccli/\n",
    "!rm -f $LOCAL_PROJECT_DIR/ngccli/*.zip \n",
    "os.environ[\"PATH\"]=\"{}/ngccli/ngc-cli:{}\".format(os.getenv(\"LOCAL_PROJECT_DIR\", \"\"), os.getenv(\"PATH\", \"\")) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List available models\n",
    "!ngc registry model list nvidia/tao/mask_auto_label:*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the model\n",
    "!ngc registry model download-version nvidia/tao/mask_auto_label:trainable_v1.0 --dest $LOCAL_PROJECT_DIR/text2box/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Check that model is downloaded into dir.\")\n",
    "!ls -l $LOCAL_PROJECT_DIR/text2box/mask_auto_label_vtrainable_v1.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cat $HOST_SPECS_DIR/segmentation_autolabel.yaml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"For multi-GPU, change `gpus` in autolabel.yaml based on your machine.\")\n",
    "!tao dataset auto_label generate -e $SPECS_DIR/segmentation_autolabel.yaml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Check the pseudo label:\")\n",
    "!ls -l $HOST_RESULTS_DIR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# install deps\n",
    "!pip3 install Cython>=0.29.36\n",
    "!pip3 install numpy\n",
    "!pip3 install pillow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import numpy as np\n",
    "import matplotlib\n",
    "matplotlib.use('Agg')\n",
    "import matplotlib.pyplot as plt\n",
    "from PIL import Image\n",
    "from pycocotools.coco import COCO\n",
    "%matplotlib inline\n",
    "\n",
    "image_dir = os.path.join(os.environ[\"HOST_DATA_DIR\"], 'raw-data/val2017')\n",
    "json_path = os.path.join(os.environ[\"HOST_RESULTS_DIR\"], 'final_instance_annotation.json')\n",
    "coco_mal = COCO(annotation_file=json_path)\n",
    "\n",
    "# Restricting to only 5 images that contain atleast one object of category_id 3\n",
    "# for ease of visualization.\n",
    "for i in coco_mal.getImgIds(catIds=[3])[:5]:\n",
    "    img_info = coco_mal.loadImgs(i)[0]\n",
    "    img_file_name = img_info[\"file_name\"]\n",
    "    print(img_file_name)\n",
    "    ann_ids = coco_mal.getAnnIds(imgIds=[i], iscrowd=None)\n",
    "    anns = coco_mal.loadAnns(ann_ids)\n",
    "    # raw image\n",
    "    im = Image.open(os.path.join(image_dir, img_file_name))\n",
    "    # plots\n",
    "    fig = plt.figure(figsize = (10,10))\n",
    "    ax1 = fig.add_subplot(211)\n",
    "    ax1.imshow(np.asarray(im))\n",
    "    ax2 = fig.add_subplot(212)\n",
    "    ax2.imshow(np.asarray(im), aspect='auto')\n",
    "    coco_mal.showAnns(anns, draw_bbox=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You have now successfully used GroundingDINO and MAL to generate an open vocabulary object detection and instance segmentation dataset. You may use this dataset to train any object detection or instance segmentation network in TAO."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "tao_launcher",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
